Optimal Sequential File Search

George E. MONAHAN

Department of Business Administration
University of Illinois at Urbana-Champaign

1. Introduction

This paper analyzes a dynamic problem in decision-making under uncertainty in which
the decision-maker must determine whether or not to purchase costly information. A
sequential computer file consists of records that have been randomly selected from a known,
finite population of records. The selected records are stored in sequential order on the basis
of the value of some key that is contained in each record. A request is made regarding
the status of a particular record in the population. The sequential file search problem is
to determine the status of the requested record. Is the record in the file? If it is not,
where should it be placed so that the sequential order is preserved? It is possible to
gather information about the contents of the file by examining one or more of its positions.
Obtaining this information is costly, however. Based upon all of the information that is
available, a search strategy specifies the positions of the file to examine, and a disposition
strategy specifies whether or not the requested record is in the file and, if it is not, where
it would be placed if it were added to the file. A reward is earned only if the status of the
record is correctly determined. An optimal search and disposition strategy maximizes the
expected value of this terminal reward less the total cost of acquiring information.

If, for example, the cost of search relates to the time required to determine the status
of a record in the population, the optimal search and disposition strategy then specifies
how the file should be searched so as to minimize the expected length of time required to
determine whether or not the requested record is in the file. Wiederhold (1977) describes
several techniques for searching a sequential file but makes no assumptions regarding the
likelihood of certain records being in the file. The model developed and analyzed here can
be viewed as a Bayesian version of the binary search and probing schemes he discusses. The
relationship between the problem studied here and other search problems in the computer
science literature is discussed in the next section.

A more specific statement of the sequential file search problem is now given. There are
n records in the population, labelled by "keys" that we assume are real numbers r_1, ..., r_n,
indexed so that r_1 < r_2 < ... < r_n. For convenience, positions within the file are called
"boxes" and are labelled Box 1, ..., Box n. Nature determines the contents of the file: record
r_i is in one of the n boxes with probability p_i, and its inclusion is independent of the
inclusion of other records in the population. Let N be the random variable that denotes the
total number of records stored in the n boxes. These N records that actually constitute the
file are ordered according to their keys: if i_j denotes the subscript of the jth record in the
file, j = 1, ..., N, then record r_{i_j} is in Box j and r_{i_1} < ... < r_{i_N}. We wish to
determine if some record, say r_k for some 1 ≤ k ≤ n, is in the file. Initially, all that we
know about the file and its contents are the probabilities p_i, i = 1, ..., n, that the various
records are in the file. To assist us in determining the status of r_k, we can gather additional
information regarding the contents of the file by examining the contents of any or all of the
boxes. The cost to examine one box and determine its contents with certainty is $c. Without
loss of generality, we assume 0 < c < 1.

After gathering information, we must declare either that r_k is in the file or that it is not.
If we declare that record r_k is not in the file, we must also indicate where in the file it
would be added. In other words, we must indicate what "gap" the record is in. We say record
r_k is in gap j if i_j < k < i_{j+1}, for some j = 0, ..., n − 1, with the conventions i_0 = 0
and i_{N+1} = n + 1. Without loss of generality, we receive a reward of $1 if we are correct
in our assessment regarding the status of r_k. If we are incorrect, we receive nothing. (Any
values could be chosen for the rewards and/or costs associated with making both correct and
incorrect decisions. The values used here are particularly convenient.) The objective is to
determine both a search strategy that tells us which boxes to examine and a disposition
strategy that specifies either that the record is in the file or the gap in which it falls. An
optimal search and disposition strategy is one that, given the initial probabilities regarding
the contents of the file, maximizes the expected terminal payoff less total search cost.
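To make the generative model and the payoff rule concrete, here is a minimal Python sketch. It assumes the keys are passed in increasing order and identifies the correct gap as the number of stored keys smaller than r_k; the names `sample_file`, `status`, and `payoff` are illustrative and do not come from the paper.

```python
import bisect
import random


def sample_file(keys, probs, rng):
    """Draw one file: record keys[i] is included independently with probability probs[i].

    keys must be sorted increasingly, so the included records are already in box
    order (Box 1 holds the smallest stored key, Box 2 the next, and so on).
    """
    return [r for r, p in zip(keys, probs) if rng.random() < p]


def status(file_keys, target):
    """Correct disposition for target: ("in",) or ("gap", j).

    j counts the stored records whose keys are smaller than target, so gap 0 lies
    before Box 1 and gap N lies after the last occupied box.
    """
    j = bisect.bisect_left(file_keys, target)
    if j < len(file_keys) and file_keys[j] == target:
        return ("in",)
    return ("gap", j)


def payoff(declaration, file_keys, target, boxes_examined, c):
    """Terminal reward of $1 for a correct declaration, less $c per box examined."""
    reward = 1.0 if declaration == status(file_keys, target) else 0.0
    return reward - c * boxes_examined


rng = random.Random(0)
f = sample_file([1, 3, 9], [0.1, 0.2, 0.3], rng)
print(f, status(f, 3))
```

With these conventions `status([1, 9], 3)` returns `("gap", 1)`, because exactly one stored key precedes 3.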

As in most problems concerning sequential decision-making under uncertainty, the
optimal search and disposition strategy seeks to achieve an economic balance between short-
and long-term rewards. The acquisition of information is costly, but better information
improves the likelihood that the disposition decision results in higher expected payoffs.
In general, these strategies are complex functions of all the information that is available
at a given stage of the decision process. We model the search problem as a Markov
decision process (MDP) and show how to compute optimal expected payoffs and an optimal
search and disposition strategy. The model formulation allows us to do a fairly extensive
analysis of how these payoffs and strategies depend upon parameters of the problem. One
particularly interesting result we establish is that under certain conditions, if it is initially
optimal to examine one box, then it is optimal to continue the examination of boxes until
all uncertainty has been resolved.

The paper is organized as follows. Section 2 relates the problem posed here to other
forms of search problems in the economics and operations research literature and to other
file search problems in the computer science literature. Section 3 presents the MDP model
of the sequential file search problem and introduces a small numerical example that is used
throughout the paper for illustration purposes. A procedure for solving the MDP
is given in Section 4. Section 5 analytically derives several results that show how certain
parameters of the problem influence both the optimal search and disposition strategy and
the optimal payoff function. Concluding remarks are in Section 6.

2. Related Literature

There is an extensive literature in both economics and operations research dealing
with the search for hidden objects under a variety of informational assumptions. See,
e.g., Stone (1975), for references to a large portion of this literature. In much of this
literature, the objective is to examine boxes when the contents are unknown. The primary
distinction of the problem analyzed in this paper is the interdependence resulting from
the assumption that the contents of the boxes are completely ordered. There are both
positive and negative ramifications associated with this interdependence. It is possible,
for example, to obtain information concerning the contents of Box 1 by examining Box 2, a
characteristic that can be beneficially exploited. On the other hand, the dependence rules
out the optimality of "reservation price" strategies, which are relatively simple rules for
determining which boxes to examine and when to terminate search. See Weitzman (1979)
for an interesting discussion of the optimality of such rules in a particular class of search
problems.

The sequential file search interpretation given in the Introduction can be viewed as
the converse of the "typical" file search problem widely discussed in the computer science
literature. In the classical problem (see, e.g., Knuth (1973)), a finite universe of possible
records is linearly ordered with respect to a specified key. A sequential file contains a
known subset of records from that universe. These records are stored on the basis of their
key values. The objective is to determine whether an object drawn from the universe is
contained in the file. This determination can only be made by comparing the selected
record to the known records already in the file. In the probabilistic version of the problem,
the probability that an object randomly drawn from the population corresponds to
a particular element of the file is known for all records in the file. The probability that the
drawn object lies between any two records in the file is also specified. Knuth (1971) gives
a dynamic program that specifies a strategy for minimizing the number of comparisons
needed to determine the status of the record.

A decision-theoretic version of the classical file search problem is given as an example
in Whinston and Moore (1986, 1987). Moore et al. (1988) analyze this problem in more
detail and present a solution technique that is based upon the analysis of a decision tree.
The problem studied in this paper is somewhat more complex than the classical search
problem. For example, binary search schemes are common techniques for searching a file
when the contents are known: starting at the middle of the file, half of the file is eliminated
from consideration on the basis of the comparison of key values. Such a scheme cannot
typically be employed for the file search problem studied here, however. Since we do not
know how many records are in the file, we don't even know where the middle is! Beginning
a search at the [n/2]th position in the file is certainly not uniformly optimal since it ignores
the probabilities of records being in the file. In Section 5, however, we discuss the optimality
of such a strategy for a special case.

In the problem studied in Moore et al. (1988), there is uncertainty related to a
single item, the object that is drawn from the universe. In the problem considered here,
there is uncertainty regarding the entire contents of the file. While both problems entail
the sequential acquisition of information, the methodologies used to determine optimal
behavior differ. The states in Moore et al. are subsets of key values that have not
yet been excluded as a result of the search process. Here the state space is the set of
probability distributions over the possible contents of the file.

The model developed here uses the fact that the contents of the file are stored in linear
order but, for expositional purposes, does not fully exploit this feature of the problem.
Blair and Monahan (1990) show how the state space of the model developed here can be
greatly reduced, thus diminishing the computational effort required to generate an optimal
strategy. The reduction is somewhat cumbersome, however, and makes it considerably
more difficult to establish qualitative properties of the optimal search and disposition
strategy. In the next section, the file search problem is formulated as a Markov decision
process.


3. A Markov Decision Process Model

Let a_1, ..., a_n denote the contents of Boxes 1 through n, respectively. If Box j is
empty, we say a_j = ∞. We define the "null record" as r_{n+1} = ∞, so that by convention,
∞ > r_j for all j. If a_j = ∞, then a_i = ∞ for i = j + 1, ..., n. If there are m records in the
file, then Boxes 1, ..., m each contain a record and Boxes m + 1, ..., n are empty. The
contents of the file are described by exactly one of the elements of B, where

\[
B = \{ a = (a_1, \dots, a_n) \mid a_j \in \{ r_1, \dots, r_n, \infty \} \text{ for all } j \text{ and } a_i \le a_{i+1} \text{ for } i = 1, \dots, n-1 \}.
\]

Given that there are n distinct potential records and multiple copies of a record are not
permitted in the file (except, of course, for r_{n+1}), there are exactly 2^n elements in B. For
convenience, let C = {1, ..., 2^n} index the 2^n vectors in B. (Just how the elements in B are
indexed is not important.)
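Enumerating B is mechanical. The Python sketch below is one way to do it, assuming the keys are supplied in increasing order and representing the null record r_{n+1} by `math.inf`; the function name `file_contents` is hypothetical. Listing subsets by size happens to reproduce the numbering of core states used later in Table 1.

```python
import math
from itertools import combinations

INF = math.inf  # key of the null record r_{n+1}: marks an empty box


def file_contents(keys):
    """Enumerate B, every feasible content vector (a_1, ..., a_n).

    keys must be sorted increasingly. Each subset of the keys fills the first boxes
    in increasing order and the remaining boxes hold INF, so every vector satisfies
    a_i <= a_{i+1} and no real record appears twice.
    """
    n = len(keys)
    B = []
    for m in range(n + 1):                    # m = number of records in the file
        for subset in combinations(keys, m):  # combinations preserve input order
            B.append(subset + (INF,) * (n - m))
    return B


B = file_contents([1, 3, 9])
print(len(B))  # 2**3 = 8 core states
```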

We know that the contents of the file are specified by one of the 2^n elements in C, but
we don't know which one. For this reason, we refer to the states in C as (unobservable)
core states. Since we know the probability that a particular record is in the file, we can
compute the probability that each core state prevails. As we gather information regarding
the contents of some of the boxes, we update the distribution over the core states via Bayes'
rule. Let 𝒮 = {Π = (π_1, ..., π_{2^n}) | 0 ≤ π_i ≤ 1 for i = 1, ..., 2^n and ∑_i π_i = 1} be the set
of probability mass functions defined on C. We denote the state of the decision process
at stage t as Π_t ∈ 𝒮; that is, Π_t summarizes all of the information that is relevant for
making decisions at stage t. (The MDP being formulated here is actually called a partially
observable Markov decision process (POMDP), reflecting the fact that the core states are
not directly observable; for a discussion regarding the sufficiency of Π_t as a state descriptor
in POMDPs, see Monahan (1982).) At the beginning of stage t, we know the value of
Π_t and can take actions that fall into two categories. First, we can terminate the search
process and choose one of several stop actions: we can either declare that the record for
which we are searching is in the file or we can declare that the record is not in the file and
falls in gap l, l = 0, ..., n − 1.

If we decide not to stop the search process, we must choose which of the n boxes should
be examined next, at a cost of $c. After the box is examined, the probability distribution
Π_t is updated via Bayes' rule to Π' (say) to reflect the information obtained. The decision
process then moves to the next stage with the state specified by Π' and proceeds as it
did at stage t.

3.1 Problem k

"Problem k" is to determine whether or not r_k is in the file, for some k = 1, ..., n. Since
only r_1, ..., r_{k−1} can precede r_k, the record r_k can occupy at most the kth position in the
file and hence can never be in Box k + 1, ..., Box n. Furthermore, an examination of the
boxes whose numbers exceed k will not provide any useful information regarding the status
of r_k. Therefore, in Problem k, we restrict our attention to the first k boxes.

Let B_ij denote the contents of Box j when the core state is i, for j = 1, ..., k and
i ∈ C. Let B_i0 = −∞ for all i. The following sets are useful in the specification of the
optimization problem. For notational convenience, we suppress the dependence of these
sets on k:

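\[
\begin{aligned}
S &= \{ i \in C \mid r_k = B_{ij} \text{ for some } j = 1, \dots, k \} \\
G_l &= \{ i \in C \mid r_k > B_{il} \text{ and } r_k < B_{i,l+1} \} && \text{for } l = 0, \dots, k-1 \\
H_{jq} &= \{ i \in C \mid r_q = B_{ij} \} && \text{for } n+1 \ge q \ge j \text{ and } j = 1, \dots, k.
\end{aligned}
\]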

S is the set of values that index core states that have r_k as one of their n elements; i.e., if
i ∈ S, then r_k is in the file. For each index value l, l = 0, ..., k − 1, G_l is the set of core
states for which r_k is in gap l. Finally, H_jq is the set of values that index core states such
that r_q is in Box j.

Notice that the sets S, G_l, and H_jq are determined by the data of the problem and
are independent of the decision process; i.e., these sets are not influenced either by the
state or by the action taken in any state of the decision process.

In the next subsections we show how the state is updated and develop the functional
equation of a dynamic program that specifies the optimal expected reward as a function of
the state of the decision process. We then give a small numerical example that illustrates
the various elements of the model.


3.2 Updating the State Vector via Bayes' Rule

Suppose we examine Box j and observe r_q, for j = 1, ..., k and n + 1 ≥ q ≥ j. With
the definition of H_jq, we define the probability that r_q is in Box j as


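\[
\sigma_{jq}(\Pi) = P\{ r_q \in \text{Box } j \} = \sum_{i \in H_{jq}} \pi_i. \tag{1}
\]

Let

\[
e_i^{jq} = \begin{cases} \pi_i & \text{if } i \in H_{jq}, \\ 0 & \text{if } i \notin H_{jq}. \end{cases}
\]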

Then, by Bayes' rule, the posterior distribution incorporating the information that r_q is
in Box j, given that Π is the prior distribution and σ_jq(Π) > 0, is


\[
\Sigma_{jq}(\Pi) = \frac{1}{\sigma_{jq}(\Pi)} \left( e_1^{jq}, \dots, e_{2^n}^{jq} \right). \tag{2}
\]

Therefore, if Π is the state of the decision process and r_q is observed in Box j, the decision
process moves to the state Σ_jq(Π).
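Equations (1) and (2) amount to masking and renormalizing the probability vector. The Python sketch below hard-codes the core states and initial distribution of the numerical example (Table 1), numbers boxes from 1 as in the text, and uses `math.inf` for an empty box; `sigma` and `posterior` are illustrative names.

```python
import math

INF = math.inf
# Core states 1..8 of the numerical example (Table 1 order) and the prior Pi_1.
CORE = [(INF, INF, INF), (1, INF, INF), (3, INF, INF), (9, INF, INF),
        (1, 3, INF), (1, 9, INF), (3, 9, INF), (1, 3, 9)]
PI1 = [0.504, 0.056, 0.126, 0.216, 0.014, 0.024, 0.054, 0.006]


def sigma(pi, j, rq):
    """sigma_jq(Pi) from (1): total mass of core states with key rq in Box j."""
    return sum(p for p, a in zip(pi, CORE) if a[j - 1] == rq)  # boxes are 1-based


def posterior(pi, j, rq):
    """Sigma_jq(Pi) from (2): zero out states in which Box j does not hold rq,
    then renormalize by sigma_jq(Pi)."""
    s = sigma(pi, j, rq)
    if s == 0:
        raise ValueError("observation has zero probability under pi")
    return [p / s if a[j - 1] == rq else 0.0 for p, a in zip(pi, CORE)]


print([round(x, 3) for x in posterior(PI1, 1, 1)])
```

Observing r_1 = 1 in Box 1 should yield (0, 0.56, 0, 0, 0.14, 0.24, 0, 0.06), which is the distribution Π_2 discussed in Section 4.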


3.3 The Optimal Value Function

We now define an infinite-horizon stochastic dynamic program that specifies the optimal
expected rewards that can be generated for any distribution over C. An infinite
planning horizon is used since there is no exogenous requirement that the search process
stop. Since search costs are positive (and there are only a finite number of boxes to search),
an optimal policy will not specify that searching be done indefinitely.

Let V(Π) denote the optimal expected reward that can be earned over an infinite
planning horizon when the state of the process is Π ∈ 𝒮. This function, called the optimal
value function, satisfies the following dynamic programming recursion:

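\[
V(\Pi) = \max \left\{
\begin{array}{ll}
\displaystyle\sum_{i \in S} \pi_i & \text{expected reward if we stop and declare } r_k \text{ is in the file} \\[2ex]
\displaystyle\sum_{i \in G_l} \pi_i, \quad l = 0, \dots, k-1 & \text{expected reward if we stop and declare } r_k \text{ is in gap } l \\[2ex]
J(\Pi, j), \quad j = 1, \dots, k & \text{expected reward if Box } j \text{ is opened and the process proceeds optimally}
\end{array}
\right. \tag{3}
\]

where

\[
J(\Pi, j) = -c + \sum_{q=j}^{n+1} V\big(\Sigma_{jq}(\Pi)\big) \cdot \sigma_{jq}(\Pi). \tag{4}
\]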


In (4), note that if we observe r_q in Box j (which happens with probability σ_jq(Π)), the new
state is Σ_jq(Π) and the optimal expected payoff that can be generated over the remainder
of the infinite planning horizon is V(Σ_jq(Π)); outcomes with σ_jq(Π) = 0 contribute nothing.
The sum on the right side of (4) is therefore the optimal expected payoff that can be obtained
if Box j is opened, its contents observed, the state updated via Bayes' rule, and the decision
process continued in an optimal manner.
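Because only finitely many distributions can be reached, (3) and (4) can be evaluated by memoized recursion. The Python sketch below does this for Problem 2 of the numerical example, hard-coding the Table 1 data and taking B_i0 = −∞. One assumption goes beyond the text: it drops the option of opening a box whose contents are already known, since doing so returns the same Π at a cost of c and can never be optimal; without this, the literal recursion would call V(Π) on itself. All names are illustrative.

```python
import math
from functools import lru_cache

INF = math.inf  # the null record r_{n+1}: an empty box
# Core states 1..8 and the initial distribution Pi_1 from Table 1.
CORE = [(INF, INF, INF), (1, INF, INF), (3, INF, INF), (9, INF, INF),
        (1, 3, INF), (1, 9, INF), (3, 9, INF), (1, 3, 9)]
PI1 = (0.504, 0.056, 0.126, 0.216, 0.014, 0.024, 0.054, 0.006)
K, C, TARGET = 2, 0.1, 3  # Problem k = 2 (is r_2 = 3 in the file?), search cost c


def in_gap(a, l):
    """True if TARGET lies in gap l of core state a (B_i0 taken as -infinity)."""
    lower = a[l - 1] if l > 0 else -INF
    return lower < TARGET < a[l]


def J(pi, j):
    """Right side of (4) for opening Box j (1-based)."""
    sig = {}  # sigma_jq(pi) for every key r_q with positive probability in Box j
    for p, a in zip(pi, CORE):
        if p > 0:
            sig[a[j - 1]] = sig.get(a[j - 1], 0.0) + p
    if len(sig) == 1:
        # Box j's contents are already known: opening it returns pi at cost c,
        # which can never beat V(pi), so the option is dropped.
        return -INF
    total = -C
    for rq, s in sig.items():
        post = tuple(p / s if a[j - 1] == rq else 0.0 for p, a in zip(pi, CORE))
        total += V(post) * s
    return total


@lru_cache(maxsize=None)
def V(pi):
    """Optimal value function (3), memoized on the distribution tuple."""
    stop_in = sum(p for p, a in zip(pi, CORE) if TARGET in a[:K])  # sum over S
    stop_gaps = [sum(p for p, a in zip(pi, CORE) if in_gap(a, l)) for l in range(K)]
    return max([stop_in] + stop_gaps + [J(pi, j) for j in range(1, K + 1)])


print(round(V(PI1), 4))
```

For the initial distribution this gives V(Π_1) = 0.89, with J(Π_1, 1) = 0.89 and J(Π_1, 2) = 0.802.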

3.4 A Numerical Example for n = 3

Suppose there are only three boxes. Let r_1 = 1, r_2 = 3, r_3 = 9, p_1 = 0.1, p_2 = 0.2,
p_3 = 0.3, and c = 0.1. Then B, the set of vectors denoting all of the 2^3 = 8 possible values
for the contents of the three boxes, is

B = {(∞,∞,∞), (1,∞,∞), (3,∞,∞), (9,∞,∞), (1,3,∞), (1,9,∞), (3,9,∞), (1,3,9)}.

In this case, 𝒮 = {Π = (π_1, ..., π_8) : 0 ≤ π_i ≤ 1, i = 1, ..., 8, and ∑_{i=1}^{8} π_i = 1}.

The eight core states and the initial probabilities that constitute Π_1, the initial state of
the decision process, are numbered and listed in Table 1.


Core    Contents of Box
State   1     2     3        Π_1
1       ∞     ∞     ∞        0.504
2       1     ∞     ∞        0.056
3       3     ∞     ∞        0.126
4       9     ∞     ∞        0.216
5       1     3     ∞        0.014
6       1     9     ∞        0.024
7       3     9     ∞        0.054
8       1     3     9        0.006

TABLE 1. Core States and Initial Probabilities
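The last column of Table 1 follows from the independence assumption: the prior probability of a core state is the product of p_i over the records it contains and of 1 − p_i over the records it omits. A Python sketch, assuming the keys are passed in increasing order (the function name is illustrative):

```python
import math
from itertools import combinations

INF = math.inf  # marks an empty box


def core_state_priors(keys, probs):
    """Content vectors in B with their prior probabilities.

    Assumes keys are sorted increasingly and that record keys[i] is in the file
    independently with probability probs[i].
    """
    n = len(keys)
    B, prior = [], []
    for m in range(n + 1):
        for idx in combinations(range(n), m):  # indices of the stored records
            B.append(tuple(keys[i] for i in idx) + (INF,) * (n - m))
            p = 1.0
            for i in range(n):
                p *= probs[i] if i in idx else 1.0 - probs[i]
            prior.append(p)
    return B, prior


B, pi1 = core_state_priors([1, 3, 9], [0.1, 0.2, 0.3])
for number, (a, p) in enumerate(zip(B, pi1), start=1):
    print(number, a, round(p, 3))
```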
Suppose we wish to determine if "3" is in the file. Then k = 2, since r_2 = 3 in
this example. The set of core state vectors that have "3" as one of their elements is
S = {3, 5, 7, 8}. Also, since "3" is in gap 0 if either state 1 or state 4 prevails, G_0 = {1, 4}.
Similarly, "3" is in gap 1 if either state 2 or state 6 prevails, so that G_1 = {2, 6}. Finally,
H_11 = {2, 5, 6, 8} indicates the states for which r_1 = 1 is in Box 1. Similarly, H_12 = {3, 7}
indicates that r_2 = 3 is in Box 1 in states 3 and 7, H_22 = {5, 8} indicates that r_2 = 3 is in
Box 2 in states 5 and 8, and H_23 = {6, 7} indicates that r_3 = 9 is in Box 2 in states 6 and
7.
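These sets can be checked mechanically. The Python sketch below hard-codes the core states of Table 1 and returns the sets with the same 1-based state labels; `problem_sets` is an illustrative name, and B_i0 is taken to be −∞ so that G_0 collects the states in which r_k precedes every stored record.

```python
import math

INF = math.inf
CORE = [(INF, INF, INF), (1, INF, INF), (3, INF, INF), (9, INF, INF),
        (1, 3, INF), (1, 9, INF), (3, 9, INF), (1, 3, 9)]  # Table 1, states 1..8
KEYS = [1, 3, 9]


def problem_sets(core, keys, k):
    """Return S, {l: G_l}, and {(j, q): H_jq} for Problem k (1-based labels)."""
    n = len(keys)
    rk = keys[k - 1]
    ext = list(keys) + [INF]  # r_1, ..., r_n and the null record r_{n+1}

    def box(a, j):  # B_ij, with B_i0 = -infinity
        return a[j - 1] if j > 0 else -INF

    labelled = list(enumerate(core, start=1))
    S = {i for i, a in labelled if any(box(a, j) == rk for j in range(1, k + 1))}
    G = {l: {i for i, a in labelled if box(a, l) < rk < box(a, l + 1)} for l in range(k)}
    H = {(j, q): {i for i, a in labelled if box(a, j) == ext[q - 1]}
         for j in range(1, k + 1) for q in range(j, n + 2)}
    return S, G, H


S, G, H = problem_sets(CORE, KEYS, k=2)
print(S, G, H[(1, 1)], H[(2, 3)])
```

S, G_0, and G_1 partition the eight core states, as they must: each file either contains "3" or places it in exactly one gap.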


4. Solving the Markov Decision Process

The optimal solution to the MDP given in (3) consists of two components: the determination
of an optimal strategy (or policy) that prescribes the action that should be taken
for each state of the decision process, and the explicit determination of V(·), the optimal
value function.

The procedure for determining a solution to the file search problem is done in two
parts. In the next subsection, we discuss a procedure, called the Valuation Algorithm,
that recursively solves a sequence of k + 1 finite-stage problems to determine V(Π) for any
given Π ∈ 𝒮. The determination of the optimal strategy forms the second phase of the
procedure.

Given the data for a particular problem, it is straightforward to generate all of the
possible posterior distributions that could be obtained by examining any combination of
boxes. This can be done by first generating all of the possible k-tuples of values that could
be observed by any possible search strategy. We refer to these k-tuples as knowledge
states, since at any stage of the decision process, all that we know for certain about the file
is summarized by exactly one of these k-tuples. It is straightforward to write a computer
program that generates all possible knowledge states based only on the data of the problem.
The knowledge states listed in Table 2 were generated by such a routine coded in BASIC.
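A Python sketch of such a generator follows, offered only as an illustration (the BASIC routine itself is not reproduced in the paper). It assumes that a knowledge state is admissible exactly when it agrees with at least one core state, which suffices here because every core state has positive prior probability, and it uses the fact that Box j can only hold r_q with q ≥ j or be empty. Box 3 is omitted because it plays no role in Problem 2; `None` marks an unexamined box.

```python
import math
from itertools import product

INF = math.inf
CORE = [(INF, INF, INF), (1, INF, INF), (3, INF, INF), (9, INF, INF),
        (1, 3, INF), (1, 9, INF), (3, 9, INF), (1, 3, 9)]  # Table 1 order
KEYS = [1, 3, 9]


def knowledge_states(core, keys, k):
    """All k-tuples of observations for Boxes 1..k that agree with some core state."""
    ext = list(keys) + [INF]
    # Box j can hold r_j, ..., r_n or be empty (INF); None means "not yet examined".
    options = [[None] + ext[j - 1:] for j in range(1, k + 1)]
    states = []
    for obs in product(*options):
        if any(all(x is None or a[j] == x for j, x in enumerate(obs)) for a in core):
            states.append(obs)
    return states


ks = knowledge_states(CORE, KEYS, k=2)
print(len(ks))  # 15, the number of rows in Table 2
```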

With the set of knowledge states available, it is then straightforward to generate the
set of all possible posterior distributions by repeatedly applying (2) to each of the values in
each knowledge state. Table 2 below lists all knowledge states and the related posterior
distributions for the numerical example. The entries in the columns headed by "Box"
indicate the known contents of the relevant box. A "-" indicates that the contents of the
box are still unknown. The values in the "Box" columns are generated by the Valuation
Algorithm. Notice that some knowledge states are equivalent to one another, in that the
associated posterior distributions are identical. The algorithm does not attempt to generate
the minimal set of knowledge states, since duplications pose no conceptual problems. Zero
probabilities are denoted by blank entries.
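Applying (2) box by box to a knowledge state amounts to conditioning the initial distribution on all of its observed entries at once. The Python sketch below (illustrative names, Table 1 data hard-coded, `None` for an unexamined box) computes the posterior for a few rows of Table 2.

```python
import math

INF = math.inf
CORE = [(INF, INF, INF), (1, INF, INF), (3, INF, INF), (9, INF, INF),
        (1, 3, INF), (1, 9, INF), (3, 9, INF), (1, 3, 9)]  # Table 1 order
PI1 = [0.504, 0.056, 0.126, 0.216, 0.014, 0.024, 0.054, 0.006]


def knowledge_posterior(prior, obs):
    """Posterior over core states given a knowledge state obs.

    Keeps the prior mass of the core states that agree with every observed box
    and renormalizes, which is what repeated application of (2) produces.
    """
    w = [p if all(x is None or a[j] == x for j, x in enumerate(obs)) else 0.0
         for p, a in zip(prior, CORE)]
    total = sum(w)
    return [x / total for x in w]


for obs in [(None, None), (1, None), (None, 9), (None, INF)]:
    print(obs, [round(x, 3) for x in knowledge_posterior(PI1, obs)])
```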


Distr.  Knowledge State   Posterior Distribution by Core State
No.     Box 1  2  3       1      2      3      4      5      6      7      8
1           -  -  -       0.504  0.056  0.126  0.216  0.014  0.024  0.054  0.006
2           1  -  -              0.560                0.140  0.240         0.060
3           3  -  -                     0.700                       0.300
4           9  -  -                            1.000
5           ∞  -  -       1.000
6           1  3  -                                   0.700                0.300
7           1  9  -                                          1.000
8           1  ∞  -              1.000
9           3  9  -                                                 1.000
10          3  ∞  -                     1.000
11          9  ∞  -                            1.000
12          ∞  ∞  -       1.000
13          -  3  -                                   0.700                0.300
14          -  9  -                                          0.308  0.692
15          -  ∞  -       0.559  0.062  0.140  0.239

TABLE 2. Knowledge States and Their Associated Posterior Distributions

The procedure for explicitly determining the optimal strategy exploits the fact that
the state of the decision process at any stage is the probability distribution over the core
states. In the following subsections, we describe the output of the Valuation Algorithm
and how it is used to determine an optimal search and disposition strategy.


4.1 Computing V(Π): The Valuation Algorithm

The decision process only moves from one stage to the next if a "search" action is
taken. In Problem k, there are only k boxes that can possibly be examined, so at most k
examinations of distinct boxes can precede the final stop action. A solution to a
(k + 1)-stage problem therefore corresponds to the solution to the infinite-horizon problem.
(There is no exogenous requirement that the process stop after a certain number of
stages. Since search costs are strictly positive, however, no optimal strategy will
prescribe the continuation of the decision process beyond k + 1 stages.) For Π ∈ 𝒮, let
V_t(Π) be the optimal expected payoff if the state is Π and the process is forced to stop after
t stages if it has not already done so. Therefore, t = 1, 2, ... is the maximum number of
stages remaining in the decision process. The finite-horizon value functions V_1(·), V_2(·), ...
satisfy the following dynamic programming equations:

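\[
V_t(\Pi) = \max\Big\{ \sum_{i \in S} \pi_i,\; \sum_{i \in G_l} \pi_i,\ l = 0, \dots, k-1,\; J_t(\Pi, j),\ j = 1, \dots, k \Big\}, \tag{5}
\]

where

\[
J_t(\Pi, j) = -c + \sum_{q=j}^{n+1} V_{t-1}\big(\Sigma_{jq}(\Pi)\big) \cdot \sigma_{jq}(\Pi) \tag{6}
\]

for t = 1, 2, ..., and where V_0(Π) = 0 for all Π ∈ 𝒮.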
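Equations (5) and (6) translate almost line for line into a memoized recursion on (t, Π). In the Python sketch below (illustrative names, Table 1 data, B_i0 taken as −∞) no special handling of already-examined boxes is needed, because the horizon t bounds the recursion. For the numerical example, V_1(Π_1), V_2(Π_1), and V_3(Π_1) come out as 0.72, 0.88, and 0.89, so additional search stages pay off until t = k + 1 = 3.

```python
import math
from functools import lru_cache

INF = math.inf
CORE = [(INF, INF, INF), (1, INF, INF), (3, INF, INF), (9, INF, INF),
        (1, 3, INF), (1, 9, INF), (3, 9, INF), (1, 3, 9)]  # Table 1 order
PI1 = (0.504, 0.056, 0.126, 0.216, 0.014, 0.024, 0.054, 0.006)
K, C, TARGET = 2, 0.1, 3  # Problem k = 2, search cost c = 0.1


def in_gap(a, l):
    """True if TARGET lies in gap l of core state a (B_i0 taken as -infinity)."""
    lower = a[l - 1] if l > 0 else -INF
    return lower < TARGET < a[l]


@lru_cache(maxsize=None)
def V(t, pi):
    """Finite-horizon value V_t(pi) from (5)-(6), with V_0 = 0."""
    if t == 0:
        return 0.0
    vals = [sum(p for p, a in zip(pi, CORE) if TARGET in a[:K])]      # declare "in"
    vals += [sum(p for p, a in zip(pi, CORE) if in_gap(a, l)) for l in range(K)]
    for j in range(1, K + 1):                                         # J_t(pi, j)
        sig = {}
        for p, a in zip(pi, CORE):
            if p > 0:
                sig[a[j - 1]] = sig.get(a[j - 1], 0.0) + p
        jt = -C
        for rq, s in sig.items():
            post = tuple(p / s if a[j - 1] == rq else 0.0 for p, a in zip(pi, CORE))
            jt += V(t - 1, post) * s
        vals.append(jt)
    return max(vals)


for t in range(1, K + 2):
    print(t, round(V(t, PI1), 4))
```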

For a given Π ∈ 𝒮, the Valuation Algorithm computes V(Π) = V_{k+1}(Π), using (5).
The algorithm computes V_{k+1}(Π) by computing the values of the variables invalue,
gapvalue, and boxvalue, which have the following interpretation. The variable invalue contains
the expected payoff if r_k is declared to be in the file and the decision process stops.
The variable gapvalue contains the expected payoff if record r_k is declared to be in the gap
whose number is stored in the variable gap. Finally, boxvalue contains the expected payoff if
the box whose number is stored in bestbox is opened, the distribution is updated according
to (2), and the decision process moves to the next stage and proceeds optimally from that
point on. This procedure, coded in BASIC, was used to compute V_{k+1}(Π_1) = 0.89, where
Π_1 = (0.504, ..., 0.006) is the initial distribution for the numerical example. (A pseudocode
version of this algorithm is given in the Appendix.) The values of some of the variables
determined in the algorithm are: invalue = 0.60, gapvalue = 0.36, boxvalue = 0.89, and
bestbox = 1. Therefore, it is optimal to open Box 1, examine its contents, update Π_1 to
Σ_1q(Π_1) if r_q is observed in Box 1, move to stage 2, and proceed optimally from there on.
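A compact Python sketch of a valuation routine in this spirit follows. It is not the paper's BASIC program or its Appendix pseudocode; it simply evaluates (5) and (6) with the Table 1 data, borrows the variable names invalue, gapvalue, gap, boxvalue, and bestbox from the text, takes B_i0 as −∞, and breaks ties in favour of the lower-numbered gap or box (an assumption). Its in-file and gap values are the sums over S and G_l in (5); for Π_1 it returns bestbox = 1 and boxvalue = V_{k+1}(Π_1) = 0.89.

```python
import math
from functools import lru_cache

INF = math.inf
CORE = [(INF, INF, INF), (1, INF, INF), (3, INF, INF), (9, INF, INF),
        (1, 3, INF), (1, 9, INF), (3, 9, INF), (1, 3, 9)]  # Table 1 order
PI1 = (0.504, 0.056, 0.126, 0.216, 0.014, 0.024, 0.054, 0.006)
K, C, TARGET = 2, 0.1, 3  # Problem k = 2, search cost c = 0.1


def in_gap(a, l):
    lower = a[l - 1] if l > 0 else -INF  # B_i0 taken as -infinity
    return lower < TARGET < a[l]


def outcomes(pi, j):
    """sigma_jq(pi) for every key r_q with positive probability in Box j."""
    sig = {}
    for p, a in zip(pi, CORE):
        if p > 0:
            sig[a[j - 1]] = sig.get(a[j - 1], 0.0) + p
    return sig


def box_value(t, pi, j):
    """J_t(pi, j) from (6)."""
    total = -C
    for rq, s in outcomes(pi, j).items():
        post = tuple(p / s if a[j - 1] == rq else 0.0 for p, a in zip(pi, CORE))
        total += V(t - 1, post) * s
    return total


def valuation(t, pi):
    """Quantities named as in the text, all evaluated with horizon t."""
    invalue = sum(p for p, a in zip(pi, CORE) if TARGET in a[:K])
    gaps = [sum(p for p, a in zip(pi, CORE) if in_gap(a, l)) for l in range(K)]
    boxes = [box_value(t, pi, j) for j in range(1, K + 1)]
    gap = max(range(K), key=lambda l: gaps[l])             # first maximiser
    bestbox = 1 + max(range(K), key=lambda i: boxes[i])    # boxes are 1-based
    return {"invalue": invalue, "gapvalue": gaps[gap], "gap": gap,
            "boxvalue": boxes[bestbox - 1], "bestbox": bestbox,
            "value": max(invalue, gaps[gap], boxes[bestbox - 1])}


@lru_cache(maxsize=None)
def V(t, pi):
    return 0.0 if t == 0 else valuation(t, pi)["value"]


out = valuation(K + 1, PI1)  # V_{k+1}(Pi_1)
print(out["bestbox"], round(out["boxvalue"], 4), round(out["value"], 4))
```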

What action should we take at stage 2? The action we take at stage 2 depends upon
what we observed in Box 1. Suppose, for example, that we observe a "1" in Box 1. Then,
using (2), Π_2 = (0, 0.560, 0, 0, 0.140, 0.240, 0, 0.060). We can use the Valuation Algorithm
to determine V_{k+1}(Π_2) and the optimal action to take if the initial distribution is Π_2. The
optimality of this action, however, depends only on the state and not on the stage of the
decision process. Therefore, the action prescribed by the algorithm when Π_2 is the state
is the optimal action to take after a "1" is observed in Box 1.

To determine V_{k+1}(Π_2), the algorithm specifies the following values: invalue = 0.60,
gapvalue = 0.4, boxvalue = 0.9, and bestbox = 2. It is clear that V_{k+1}(Π_2) = 0.9, and it is
now optimal to pay c = 0.1 and open Box 2! The fact that we just spent c = 0.1 to examine
Box 1 is no longer relevant. Since it is optimal to examine Box 2 and we are looking for
"3", the second smallest record value, the remaining contingencies are trivial: we will know
for certain whether "3" is in the file or, if it isn't, what the gap will be. We exploit the fact
that the Valuation Algorithm can be used to determine the optimal action to take as a
function of any distribution on C. The fact that we are computing the optimal expected
payoff for an infinite-horizon problem when in fact we are in some stage t is not relevant. We
compute V_{k+1}(·) when the decision process is actually in stages less than k + 1, not to
determine the absolute magnitude of the payoffs associated with each of the feasible actions,
but to see which action yields the highest payoff given the information available at that stage.


4.2 An Optimal Search and Disposition Strategy

We identify an optimal search and disposition strategy in the following way. For
each distribution in Table 2, compute $V_{k+1}(\cdot)$ and the values of the relevant variables using
the Valuation Algorithm. Table 3 displays the information generated by the algorithm for
each of the 15 distributions in Table 2. The initial distribution is in Row 1. (Notice that
the knowledge state in Row 1 indicates that none of the boxes has been examined.) The
value of $V_{k+1}(\cdot)$ in Row 1 is 0.89, the optimal expected payoff of the decision process. The optimal
action is to examine Box 1, since boxvalue = 0.89 and bestbox = 1. Suppose we observe a
"1" in Box 1. The knowledge state is then (1, -, -), which is Row 2 of Table 3. The
posterior distribution associated with this knowledge state is in Row 2 of Table 2. The
action that yields the highest expected payoff, given that we begin the decision process in this
knowledge state, is to examine Box 2. (Here boxvalue = $V_{k+1}(\cdot)$ = 0.90 and bestbox = 2.)
Suppose that we observe a "9" in Box 2. The new knowledge state is (1, 9, -), which is Row
7 of Table 3. The optimal action in this knowledge state is to
stop the process and declare that "3" is in gap 1. This is the strategy discussed earlier.
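The walk-through above can be replayed mechanically from the table. The Python sketch below is illustrative only: it keeps just the Table 3 rows visited in the text (Rows 1, 2, 3 and 7), takes bestbox for Rows 1 and 2 from the text (it is not needed elsewhere on this path), and assumes that the action attaining $V_{k+1}(\cdot) = \max\{\text{invalue}, \text{gapvalue}, \text{boxvalue}\}$ is taken, with ties resolved in favor of stopping. `TABLE3` and `follow_plan` are hypothetical names.

```python
# Rows of Table 3 used in the walk-through:
# knowledge state -> (invalue, gapvalue, gap, boxvalue, bestbox).
# bestbox is taken from the text for Rows 1 and 2; it is not needed (None) where stopping wins.
TABLE3 = {
    ("-", "-", "-"): (0.600, 0.360, 0, 0.89, 1),     # Row 1
    ("1", "-", "-"): (0.600, 0.400, 1, 0.90, 2),     # Row 2
    ("3", "-", "-"): (1.000, 0.000, 0, 0.90, None),  # Row 3
    ("1", "9", "-"): (0.000, 1.000, 1, 0.90, None),  # Row 7
}


def follow_plan(contents):
    """Follow the contingency plan; `contents` maps a box number to what it would reveal."""
    state = ["-", "-", "-"]
    examined = []
    while True:
        invalue, gapvalue, gap, boxvalue, bestbox = TABLE3[tuple(state)]
        best = max(invalue, gapvalue, boxvalue)
        if invalue == best:
            return examined, "declare in file"
        if gapvalue == best:
            return examined, f"declare gap {gap}"
        examined.append(bestbox)                # pay c and open the best box
        state[bestbox - 1] = contents[bestbox]  # record what the box holds


print(follow_plan({1: "1", 2: "9"}))  # ([1, 2], 'declare gap 1')
print(follow_plan({1: "3"}))          # ([1], 'declare in file')
```

The first call reproduces the path described in the text; the second shows the branch in which "3" is found in Box 1 and the process stops at once.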

Since every possible knowledge state is listed in Table 3, this table summarizes the
complete contingency plan that comprises the optimal search and disposition strategy. Notice
that an alternate (finite-state) formulation of the problem, whose states are the
values known to be in each of the boxes at any stage of the decision process, would
require a table much like Table 2. In this sense, the effort required to determine the
values in Table 2 cannot be avoided. The procedure here computes this table and then
evaluates each of the possible states.
Distr.  Knowledge state           Payoff values
No.     Box 1  Box 2  Box 3    invalue  gapvalue  gap  boxvalue  V(·)
 1        -      -      -       0.600    0.360     0     0.89    0.890
 2        1      -      -       0.600    0.400     1     0.90    0.900
 3        3      -      -       1.000    0.000     0     0.90    1.000
 4        9      -      -       0.000    1.000     0     0.90    1.000
 5        0      -      -       0.000    1.000     0     0.90    1.000
 6        1      3      -       1.000    0.000     0     0.90    1.000
 7        1      9      -       0.000    1.000     1     0.90    1.000
 8        1      0      -       0.000    1.000     1     0.90    1.000
 9        3      9      -       1.000    0.000     0     0.90    1.000
10        3      0      -       1.000    0.000     0     0.90    1.000
11        9      0      -       0.000    1.000     0     0.90    1.000
12        0      0      -       0.000    1.000     0     0.90    1.000
13        -      3      -       1.000    0.000     0     0.90    1.000
14        -      9      -       0.931    0.069     1     0.90    0.931
15        -      0      -       0.551    0.408     0     0.90    0.900
(- = box not yet examined; 0 = box found empty.)

TABLE  3.  Evaluation  of  Posterior  Distributions 

There are several interesting features of the computational scheme proposed here.
The formulation given in (3) is a POMDP whose state space is the continuum $\mathcal{S}$. While
there are methods for solving infinite-horizon POMDP's (see Monahan (1982) and references
therein), they are not nearly as efficient as algorithms for solving finite-state and
finite-action MDP's. The procedure suggested here effectively reduces the computational burden
required to solve this particular POMDP to that of a finite-state, finite-action, and
finite-horizon MDP. The optimal solution to such an MDP is easily determined by generating
tables of payoffs, and of the actions yielding those payoffs, for all stages in the planning
horizon. The optimal strategy is then determined by the values in the tables. See Hillier
and Lieberman (1985) for an example of such a procedure. The method used here is a
variation of this backward substitution method. The sequence of activities leading to the
optimal strategy, however, differs from conventional methods. In spite of this difference,
the overall computational effort of the two procedures is equivalent.
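For comparison, the conventional backward-substitution scheme mentioned above can be sketched as a generic finite-horizon routine that builds one table of payoffs and one table of actions per stage. This is an illustration under stated assumptions, not the paper's procedure: `stop_reward(s)` is assumed to return the best stopping payoff in state s, `searches(s)` to list the costly actions with their outcome distributions, and the table for one remaining stage allows only stopping. The three-state example at the end is hypothetical.

```python
from fractions import Fraction


def backward_induction(states, stop_reward, searches, horizon):
    """Build tables V[t][s] and policy[t][s] for t = 1..horizon stages remaining.

    stop_reward(s) -> payoff of the best stop action in state s.
    searches(s)    -> list of (action, cost, [(prob, next_state), ...]).
    """
    V = {1: {s: stop_reward(s) for s in states}}        # one stage left: stop only
    policy = {1: {s: "stop" for s in states}}
    for t in range(2, horizon + 1):
        V[t], policy[t] = {}, {}
        for s in states:
            best_value, best_action = stop_reward(s), "stop"
            for action, cost, outcomes in searches(s):
                # Pay the cost, then continue optimally with t - 1 stages left.
                value = -cost + sum(p * V[t - 1][s2] for p, s2 in outcomes)
                if value > best_value:
                    best_value, best_action = value, action
            V[t][s], policy[t][s] = best_value, best_action
    return V, policy


# Hypothetical example: one unresolved state that a search of cost 1/10 resolves.
half = Fraction(1, 2)
stop = {"unknown": half, "in": Fraction(1), "out": Fraction(1)}
moves = {"unknown": [("examine", Fraction(1, 10), [(half, "in"), (half, "out")])]}
V, policy = backward_induction(stop, stop.get, lambda s: moves.get(s, []), horizon=3)
print(V[2]["unknown"], policy[2]["unknown"])  # 9/10 examine
```

Because a stop action is always available, each added stage only enlarges the set of options, so in this construction V[t][s] is never smaller than V[t-1][s].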


5.  Analytical  Results 
The formulation of the problem with the state of the decision process being a distribution
over core states is particularly amenable to sensitivity analysis that determines how
changes in parameters of the problem influence both the optimal expected payoff and an
optimal search and disposition strategy. In this section, we establish several properties of
these functions. Several of the results are established by showing that they hold for the
finite-horizon version of the problem given in (5) and (6) and that these properties persist
as the planning horizon goes to infinity. (In fact, we know that for any $t > k$, $V_t(\Pi) = V_{k+1}(\Pi)$
for any $\Pi \in \mathcal{S}$.) Therefore, properties established for $V_t(\cdot)$ for every $t$ also hold for $V(\cdot)$.

5.1  Qualitative  Results  for  the  General  Problem 

The  first  result  is  used  to  characterize  the  set  of  states  at  which  it  is  optimal  to  stop 
or  to  search. 

Proposition 1. $V_t(\Pi)$ is convex on $\mathcal{S}$ for $t = 0, 1, \ldots$.

PROOF: The result is easily established by induction on $t$. The most difficult step is to
show that $J_t(\Pi, j)$ is convex in $\Pi$ if $V_{t-1}(\Pi)$ is convex. This result is well known, however.
See, for example, Astrom (1969, Lemma 2) or DeGroot (1970, Lemma 1, page 435).

The convexity of the optimal value functions and the linearity of the payoffs when a
stop action is taken lead immediately to the convexity of the sets of states at which it is
optimal to stop and declare either that $r_k$ is in the file or that $r_k$ is in some gap. Let

$$S_{G_l} = \Big\{\Pi \in \mathcal{S} \;\Big|\; V(\Pi) = \sum_{i \in G_l} \pi_i\Big\}$$

be the set of states for which stopping and declaring the record to be in gap $l$, $l = 0, \ldots, k-1$,
is optimal. Let

$$S_I = \Big\{\Pi \in \mathcal{S} \;\Big|\; V(\Pi) = \sum_{i \in S} \pi_i\Big\}$$

be the set of states such that stopping and declaring the record to be in the file is optimal.

COROLLARY 1. The sets $S_{G_l}$, $l = 0, \ldots, k-1$, and $S_I$ are convex.

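PROOF: We prove that $S_I$ is convex. The demonstration that $S_{G_l}$ is convex is analogous.
For $\Pi_1, \Pi_2 \in S_I$ and $0 < \lambda < 1$, let $\Pi = (\pi_1, \ldots, \pi_{2^n}) = \lambda \Pi_1 + (1-\lambda)\Pi_2$. Then

$$V(\Pi) \le \lambda V(\Pi_1) + (1-\lambda) V(\Pi_2) = \sum_{i \in S} \pi_i,$$

where the inequality follows from the convexity of $V(\cdot)$ and the equality from the linearity of
$\sum_{i \in S} \pi_i$ in $\Pi$. But $V(\Pi) \ge \sum_{i \in S} \pi_i$, because declaring the record to be in the file is always
feasible, so equality must hold and $\Pi \in S_I$.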

Let $V(\Pi, c)$ explicitly reflect the dependence of optimal expected payoffs on the search
cost. The next result establishes the intuitive characteristic that optimal payoffs decline
as the search cost increases and, furthermore, that the marginal decrease in optimal payoffs diminishes as $c$ grows.

PROPOSITION 2. For $\Pi \in \mathcal{S}$, $V_t(\Pi, c)$ is nonincreasing and convex in $c$ for $t = 0, 1, \ldots$.

PROOF: Again, the proof is by induction on $t$. Since $V_1(\Pi, c)$ is independent of $c$, the
result is trivially true. Assume that $V_{t-1}(\Pi, c)$ is nonincreasing and convex in $c$. Then

$$J_t(\Pi, j, c) = -c + \sum_{q=j}^{n+1} V_{t-1}\big(E_{jq}(\Pi), c\big)\, \sigma_{jq}(\Pi)$$

is also convex and nonincreasing in $c$, since it is the sum of a linear (hence convex) decreasing function of $c$
and a weighted average of convex, nonincreasing functions of $c$. The stop payoffs do not depend on $c$.
Since the pointwise maximum of convex, nonincreasing functions inherits these properties, the proof is
complete.

The  characterization  of  the  optimal  payoffs  as  a  function  of  the  search  cost  leads 
immediately  to  another  intuitive  result. 

COROLLARY 2. There exists a $c^* \in (0, 1)$ such that for $c > c^*$ it is never optimal to search
any box.

We now establish the fact that having perfect information about the contents of a box
cannot diminish optimal expected payoffs. This result, while interesting in its own right,
is also used in the next subsection to characterize the optimal search strategy for a special
case. For convenience, several intermediate definitions and results are required. Let $\alpha(\Pi)$
be the expected payoff if the best stop action is taken when the state is $\Pi \in \mathcal{S}$; i.e.,

$$\alpha(\Pi) = \max\Big\{\sum_{i \in S} \pi_i,\ \sum_{i \in G_l} \pi_i \ \text{ for } l = 0, \ldots, k-1\Big\}. \qquad (7)$$
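To make (7) concrete, the Python sketch below evaluates $\alpha(\Pi)$ for the example of Section 5.2 (records 1 < 3 < 9, each included independently with probability 1/2, requested record 9). It assumes, as the examples in Tables 3 and 4 indicate, that a core state is the sorted set of records in the file and that gap $l$ collects the core states in which $r_k$ is absent and exactly $l$ smaller records are present. The helpers `prior` and `alpha` are hypothetical names.

```python
from fractions import Fraction
from itertools import product


def prior(records, probs):
    """Initial distribution over core states (sorted tuples of included records),
    assuming each record is included independently with its own probability."""
    dist = {}
    for mask in product((False, True), repeat=len(records)):
        pr = Fraction(1)
        for included, p in zip(mask, probs):
            pr *= p if included else 1 - p
        dist[tuple(r for r, inc in zip(records, mask) if inc)] = pr
    return dist


def alpha(dist, target):
    """Equation (7): payoff of the best stop action, 'in file' or 'in gap l'."""
    in_file = sum(pr for s, pr in dist.items() if target in s)
    gap_mass = {}
    for s, pr in dist.items():
        if target not in s:
            l = sum(1 for r in s if r < target)      # file records preceding the target
            gap_mass[l] = gap_mass.get(l, 0) + pr
    return max([in_file, *gap_mass.values()])


pi1 = prior((1, 3, 9), [Fraction(1, 2)] * 3)
print(alpha(pi1, 9))                                  # 1/2

# Condition on Box 3 being empty (fewer than three records in the file).
kept = {s: pr for s, pr in pi1.items() if len(s) < 3}
total = sum(kept.values())
pi35 = {s: pr / total for s, pr in kept.items()}
print(alpha(pi35, 9))                                 # 3/7
```

The two printed values agree with the stopping payoffs 0.50 and 0.43 reported for Distributions 1 and 35 in Section 5.2.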

The  next  results  describe  how  payoffs  associated  with  stopping  depend  upon  the  stage 
of  the  decision  process.  The  first  result  is  standard  in  the  analysis  of  MDP  models  and  is 
stated  without  proof:  as  the  planning  horizon  lengthens,  optimal  payoffs  cannot  diminish. 
The second part establishes that if it is optimal to stop when at most t stages remain, it is
optimal to stop when at most t - 1 stages remain.
Lemma 1. For $\Pi \in \mathcal{S}$ and $t = 2, 3, \ldots$,

a. $V_t(\Pi) \ge V_{t-1}(\Pi)$.

b. If $V_t(\Pi) = \alpha(\Pi)$, then $V_{t-1}(\Pi) = \alpha(\Pi)$.

PROOF: (Part b.) Suppose (b) is false. Since a stop action is always available, $V_{t-1}(\Pi) \ge \alpha(\Pi)$,
so $V_{t-1}(\Pi) > \alpha(\Pi) = V_t(\Pi)$, which contradicts Lemma 1a.


We now examine how optimal payoffs are influenced by some of the initial probabilities
that records are in the file. In particular, we establish the intuitively appealing
results that in Problem k the expected payoff associated with declaring $r_k$ to be in the
file is nondecreasing in $p_k$ and that the payoff associated with declaring $r_k$ to be in gap
$l$, $l = 0, \ldots, k-1$, is nonincreasing in $p_k$. Before doing this, however, we establish some
preliminary results. Let $\Pi(p_k)$ denote the initial state of the decision process when $p_k$ is
the initial probability that $r_k$ is in the file. Let $\pi_i(p_k)$ be the $i$th component of $\Pi(p_k)$.
Analogously, $\Pi_t(p_k)$ denotes the distribution at the beginning of period $t$.

The next results are used to establish the dependence of any posterior probability on
$p_k$. For notational convenience, let $S^c = \{i \mid i \in C,\ i \in G_l \text{ for some } l = 0, 1, \ldots, k-1\}$
denote the complement of $S$, that is, the core states in which $r_k$ is not in the file.

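Lemma 2. For $t \ge 2$,

$$\pi_{t,i} = \begin{cases} \dfrac{\pi_{1,i}}{\sum_{l \in I} \pi_{1,l}} & \text{if } i \in I,\\[6pt] 0 & \text{otherwise,} \end{cases}$$

where $I \subset C$ is the set of core states consistent with the observations made before period $t$.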

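PROOF: The proof is by induction on $t$. Using (1) and (2), the result holds at $t = 2$: in passing
from $\Pi_1$ to $\Pi_2$ we observe the occurrence of an event $I \subset C$, and this knowledge is used to construct
$\Pi_2$. Assume that the result holds for some $t \ge 2$, so that

$$\pi_{t,i} = \begin{cases} \dfrac{\pi_{1,i}}{\sum_{l \in I} \pi_{1,l}} & \text{if } i \in I,\\[6pt] 0 & \text{otherwise,} \end{cases}$$

where $I \subset C$. Suppose we examine Box $j$ and observe $r_q$, and let $H_{jq}$ be the set of core states in
which Box $j$ holds $r_q$. If $\pi_{t,i} = 0$ or $i \notin H_{jq}$, then $E_{jqi}(\Pi_t) = 0$.
Suppose that $\pi_{t,i} > 0$ and $i \in H_{jq}$. Then, from (1) and (2),

$$E_{jqi}(\Pi_t) = \frac{\pi_{t,i}}{\sum_{m \in H_{jq}} \pi_{t,m}} = \frac{\pi_{1,i} \big/ \sum_{l \in I} \pi_{1,l}}{\sum_{m \in H_{jq} \cap I} \pi_{1,m} \big/ \sum_{l \in I} \pi_{1,l}} = \frac{\pi_{1,i}}{\sum_{m \in H_{jq} \cap I} \pi_{1,m}},$$

which has the same form with $I$ replaced by $H_{jq} \cap I$, and the result holds for all $t$.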
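Lemma 2 says that every posterior is the initial distribution restricted to the core states still consistent with what has been observed, then renormalized, so the order of observations does not matter. A minimal sketch under the same assumptions as the Section 5.2 example (core states as sorted tuples, Box j holding the jth smallest record in the file, `None` marking an empty box); `observe` is a hypothetical name for $E_{jq}$.

```python
from fractions import Fraction
from itertools import product


def prior(records, probs):
    """Initial distribution over core states, assuming independent inclusion."""
    dist = {}
    for mask in product((False, True), repeat=len(records)):
        pr = Fraction(1)
        for included, p in zip(mask, probs):
            pr *= p if included else 1 - p
        dist[tuple(r for r, inc in zip(records, mask) if inc)] = pr
    return dist


def box(state, j):
    """Contents of Box j (1-based): the jth smallest record in the file, or None if empty."""
    return state[j - 1] if j <= len(state) else None


def observe(dist, j, q):
    """E_jq: condition on Box j holding q; sigma_jq is the normalizing constant."""
    kept = {s: pr for s, pr in dist.items() if box(s, j) == q}
    sigma = sum(kept.values())  # probability of this observation (assumed > 0)
    return {s: pr / sigma for s, pr in kept.items()}


pi1 = prior((1, 3, 9), [Fraction(1, 2)] * 3)
a = observe(observe(pi1, 1, 1), 2, 3)   # Box 1 holds 1, then Box 2 holds 3
b = observe(observe(pi1, 2, 3), 1, 1)   # the same facts, observed in the opposite order
consistent = {s for s in pi1 if box(s, 1) == 1 and box(s, 2) == 3}
direct = {s: pi1[s] / sum(pi1[u] for u in consistent) for s in consistent}
print(a == b == direct, a)
```

The direct computation restricts $\Pi_1$ to the consistent set $I$ and renormalizes, exactly as in the statement of Lemma 2.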


Lemma 3.

a. For $i \in S$, $\pi_i(p_k) = p_k \dfrac{d\pi_i}{dp_k}$.

b. For $i \in S^c$, $\pi_i(p_k) = -(1 - p_k) \dfrac{d\pi_i}{dp_k}$.


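PROOF: For any $i \in S$, $\pi_i = \kappa_i p_k$, where $\kappa_i > 0$ is the product of some combination of $p_j$
or $(1 - p_j)$ for all $j \ne k$ and does not depend upon $p_k$. Therefore

$$\frac{d\pi_i}{dp_k} = \kappa_i = \frac{\pi_i(p_k)}{p_k},$$

and (a) is established.
Similarly, for $i \in S^c$, $\pi_i = \kappa_i (1 - p_k)$, so that

$$\frac{d\pi_i}{dp_k} = -\kappa_i = -\frac{\pi_i(p_k)}{1 - p_k},$$

so that part (b) also holds.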
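Lemma 3 is easy to check numerically. The sketch below uses the inclusion probabilities of the numerical example given later in the paper ($p_1 = 0.1$, $p_2 = 0.3$, $p_3 = 0.8$) purely as test values; because $\pi_i$ is linear in $p_k$, an exact-arithmetic difference quotient equals the derivative. `state_prob` and `check_lemma3` are hypothetical helpers, and independent inclusion is assumed.

```python
from fractions import Fraction
from itertools import combinations


def state_prob(state, records, probs):
    """Initial probability pi_i of the core state `state` (independent inclusion)."""
    pr = Fraction(1)
    for r, p in zip(records, probs):
        pr *= p if r in state else 1 - p
    return pr


def check_lemma3(records, probs, k, h=Fraction(1, 100)):
    """True if pi_i = p_k dpi_i/dp_k on S and pi_i = -(1 - p_k) dpi_i/dp_k on S^c.

    k is the 0-based position of r_k in `records`. pi_i is linear in p_k,
    so the exact difference quotient below equals the derivative.
    """
    bumped = list(probs)
    bumped[k] += h
    for size in range(len(records) + 1):
        for state in combinations(records, size):
            pi = state_prob(state, records, probs)
            slope = (state_prob(state, records, bumped) - pi) / h
            if records[k] in state:                     # i in S
                ok = pi == probs[k] * slope
            else:                                       # i in S^c
                ok = pi == -(1 - probs[k]) * slope
            if not ok:
                return False
    return True


print(check_lemma3((1, 3, 9), [Fraction(1, 10), Fraction(3, 10), Fraction(8, 10)], k=2))  # True
```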

Using these facts, we can now establish the influence of $p_k$ on the state distribution.

Proposition 3. For $t = 1, 2, \ldots$,

$$\pi_{t,i}(p_k) \text{ is } \begin{cases} \text{nondecreasing in } p_k & \text{if } i \in S,\\ \text{nonincreasing in } p_k & \text{if } i \in S^c. \end{cases}$$

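PROOF: The proof is by induction on $t$. For $t = 1$, the result follows immediately from
Lemma 3 since $\kappa_i > 0$ for all $i$. Assume the result holds for some $t \ge 1$.

Suppose that an examination of Box $j$ reveals record $r_q$. Then, using Lemma 2, the
$i$th component of $E_{jq}(\Pi_t(p_k))$ is

$$E_{jqi}(\Pi_t(p_k)) = \frac{\pi_{t,i}(p_k)}{\sum_{m \in H_{jq}} \pi_{t,m}(p_k)}$$

if $i \in H_{jq}$ and $\pi_{t,i}(p_k) > 0$, and $0$ otherwise. Write $\pi_{t,m}$ for $\pi_{t,m}(p_k)$ and $D = \sum_{m \in H_{jq}} \pi_{t,m}$.
By Lemma 2, $\pi_{t,m} = \pi_{1,m}/N$ on its support, where $N = \sum_{l \in I} \pi_{1,l}$ and $N' = dN/dp_k$; hence, by Lemma 3,
$d\pi_{t,m}/dp_k = \pi_{t,m}\,(1/p_k - N'/N)$ for every $m \in S$ and $d\pi_{t,m}/dp_k = \pi_{t,m}\,(-1/(1-p_k) - N'/N)$ for every
$m \in S^c$ (both sides vanish when $\pi_{t,m} = 0$). Therefore, for $i \in S$ such that $\pi_{t,i}(p_k) > 0$,

$$\frac{dE_{jqi}(\Pi_t(p_k))}{dp_k} = \frac{1}{D^2} \sum_{l \in H_{jq}} \Big( \pi_{t,l} \frac{d\pi_{t,i}}{dp_k} - \pi_{t,i} \frac{d\pi_{t,l}}{dp_k} \Big) = \frac{1}{D^2} \sum_{l \in H_{jq} \cap S^c} \Big( \pi_{t,l} \frac{d\pi_{t,i}}{dp_k} - \pi_{t,i} \frac{d\pi_{t,l}}{dp_k} \Big) \ge 0,$$

where the second equality follows from Lemma 3: the terms with $l \in H_{jq} \cap S$ cancel because
$(d\pi_{t,l}/dp_k)/\pi_{t,l}$ takes the same value for all such $l$ and for $i$. The inequality follows since
$d\pi_{t,i}/dp_k \ge 0$ from the induction hypothesis and $d\pi_{t,l}/dp_k \le 0$ for all $l \in H_{jq} \cap S^c$.

For $i \in S^c$ such that $\pi_{t,i}(p_k) > 0$, the same argument, with the terms for $l \in H_{jq} \cap S^c$ now
cancelling, gives

$$\frac{dE_{jqi}(\Pi_t(p_k))}{dp_k} = \frac{1}{D^2} \sum_{l \in H_{jq} \cap S} \Big( \pi_{t,l} \frac{d\pi_{t,i}}{dp_k} - \pi_{t,i} \frac{d\pi_{t,l}}{dp_k} \Big) \le 0,$$

where the inequality follows from the induction hypothesis that $d\pi_{t,l}/dp_k \le 0$ ($\ge 0$) for $l \in S^c$
($l \in S$), respectively. Since $E_{jq}(\Pi_t(p_k)) = \Pi_{t+1}(p_k)$, the result therefore holds for all $t$.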

Since the expected payoffs associated with stopping are sums of the $\pi_{t,i}(p_k)$, it follows
immediately that these payoffs are also monotonic in $p_k$.

Corollary 3.

i. $\sum_{i \in S} \pi_{t,i}(p_k)$ is nondecreasing in $p_k$, and
ii. $\sum_{i \in G_l} \pi_{t,i}(p_k)$ is nonincreasing in $p_k$ for $l = 0, \ldots, k-1$.
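As a numerical illustration of Proposition 3 and Corollary 3, the sketch below takes the Section 5.2 setting with $p_1 = p_2 = 1/2$, lets $p_3$ vary, and conditions on Box 3 being found empty. It assumes independent inclusion and reads gap $l$ as "9 absent, $l$ smaller records present"; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prior(records, probs):
    """Initial distribution over core states, assuming independent inclusion."""
    dist = {}
    for mask in product((False, True), repeat=len(records)):
        pr = Fraction(1)
        for included, p in zip(mask, probs):
            pr *= p if included else 1 - p
        dist[tuple(r for r, inc in zip(records, mask) if inc)] = pr
    return dist


def stop_payoffs_after_box3_empty(p3):
    """(P(9 in file), [P(9 in gap l) for l = 0, 1, 2]) once Box 3 is found empty."""
    dist = {s: pr for s, pr in prior((1, 3, 9), [Fraction(1, 2), Fraction(1, 2), p3]).items()
            if len(s) < 3}                                   # Box 3 empty: at most two records
    total = sum(dist.values())
    in_file = sum(pr for s, pr in dist.items() if 9 in s) / total
    gaps = [sum(pr for s, pr in dist.items() if 9 not in s and len(s) == l) / total
            for l in range(3)]                               # 9 absent: len(s) is the gap index
    return in_file, gaps


grid = [Fraction(i, 10) for i in range(1, 10)]
rows = [stop_payoffs_after_box3_empty(p) for p in grid]
in_vals = [r[0] for r in rows]
print(all(a <= b for a, b in zip(in_vals, in_vals[1:])))    # True: nondecreasing in p_3
for l in range(3):
    g = [r[1][l] for r in rows]
    print(l, all(a >= b for a, b in zip(g, g[1:])))         # each gap: nonincreasing in p_3
```

Here the in-file probability works out to $3p_3/(4 - p_3)$, which is increasing, while each gap probability has the factor $1 - p_3$ in its numerator and decreases.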

We conclude the discussion of the characteristics of optimal search and disposition
strategies by examining the special case in which each of the records is equally likely to be
included in the file and that common probability is 0.5.
5.2 Special Case: Problem n = 3 when $p_i = 0.5$ for all i.

We now examine the example in Section 3.3 when k = 3 and $p_i = 0.5$ for all i. In
some sense, this is the "maximum uncertainty" case: all records are equally likely to be
included and, since this is Problem 3, all boxes are potential candidates for search. We use
this example to illustrate several interesting properties of optimal search and disposition
strategies.

Table 4 below shows expected payoffs and optimal actions for several knowledge states
(those associated with uncertainty regarding the status of $r_3 = 9$). The cost of
searching one box is c = 0.28. The column headed Box i contains the expected payoff if
Box i is examined first and an optimal policy is followed thereafter. The "Optimal Action"
column indicates the optimal first action as a function of the state of the decision process.


Distr.  Knowledge state           Payoff values                                   Optimal
No.     Box 1  Box 2  Box 3    invalue  gapvalue  gap   Box 1  Box 2  Box 3    action
 1        -      -      -       0.50     0.25            0.44   0.51   0.34    Box 2
 2        1      -      -       0.50     0.25            0.30   0.58   0.51    Box 2
 3        3      -      -       0.50     0.50            0.44   0.72   0.44    Box 2
 6        1      3      -       0.50     0.50            0.44   0.44   0.72    Box 3
22        1      -      0       0.33     0.33            0.44   0.72   0.44    Box 2
23        3      -      9       0.50     0.50            0.44   0.72   0.44    Box 2
24        3      -      0       0.50     0.50            0.44   0.72   0.44    Box 2
27        -      3      -       0.50     0.50       2    0.44   0.44   0.72    Box 3
29        -      0      -       0.25     0.50            0.72   0.44   0.44    Box 1
33        -      0      0       0.25     0.50            0.72   0.44   0.44    Box 1
35        -      -      0       0.43     0.29            0.52   0.56   0.28    Box 2
(- = box not yet examined; 0 = box found empty.)

TABLE  4.  Evaluation  of  Some  Posterior  Distributions 
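The payoff columns of Table 4 can be reproduced by brute force. The Python sketch below is an illustration rather than the author's program: it assumes the model used throughout (independent inclusion, sorted storage, Box j holding the jth smallest record in the file or nothing), represents a knowledge state by the set of core states still possible, and recurses until no informative search remains. Opening a box whose contents are already known is skipped inside the recursion because it only costs c. All names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

RECORDS, TARGET = (1, 3, 9), 9          # Problem 3: status of r_3 = 9
# With p_i = 1/2 for every record, each of the 8 core states has prior 1/8.
PRIOR = {tuple(r for r, inc in zip(RECORDS, mask) if inc): Fraction(1, 8)
         for mask in product((False, True), repeat=3)}


def box(state, j):
    """Contents of Box j: the jth smallest record in the file, or None if empty."""
    return state[j - 1] if j <= len(state) else None


def stop_value(support):
    """alpha(Pi) for the posterior proportional to PRIOR on `support`."""
    total = sum(PRIOR[s] for s in support)
    mass = {"in": Fraction(0)}
    for s in support:
        # 9 is the largest record, so when it is absent len(s) is the gap index.
        key = "in" if TARGET in s else len(s)
        mass[key] = mass.get(key, Fraction(0)) + PRIOR[s]
    return max(mass.values()) / total


def search_value(support, j, c):
    """Pay c, open Box j, then continue optimally (the Box j column of Table 4)."""
    total = sum(PRIOR[s] for s in support)
    outcomes = {}
    for s in support:
        outcomes.setdefault(box(s, j), set()).add(s)
    return -c + sum(sum(PRIOR[s] for s in g) / total * value(frozenset(g), c)
                    for g in outcomes.values())


@lru_cache(maxsize=None)
def value(support, c):
    """Optimal expected payoff V(Pi, c); only boxes with uncertain contents are searched."""
    best = stop_value(support)
    for j in (1, 2, 3):
        if len({box(s, j) for s in support}) > 1:
            best = max(best, search_value(support, j, c))
    return best


c = Fraction(28, 100)
start = frozenset(PRIOR)                                        # knowledge state (-, -, -)
print([float(search_value(start, j, c)) for j in (1, 2, 3)])    # [0.44, 0.51, 0.335]
```

The Box 3 value 0.335 appears rounded to 0.34 in Table 4. Other rows can be checked by restricting the support, e.g. to the core states with `box(s, 1) == 1` for Distribution 2; the same function also lets one scan `value(start, c)` over a grid of c to observe the monotonicity and convexity asserted in Proposition 2.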
For convenience, $\Pi_i$ will refer to distribution number i in Table 4. Several
observations can be made from the data in Table 4:

1. $\alpha(\Pi_1) = 0.5 > \alpha(\Pi_{35}) = 0.43$, where $\Pi_{35} = E_{3\emptyset}(\Pi_1)$ (i.e., $\Pi_{35}$ is the posterior distribution
when $\Pi_1$ is the prior and Box 3 is opened and found to be empty). Therefore, in general,

$$\alpha(E_{jq}(\Pi)) \not\ge \alpha(\Pi) \qquad (8)$$

for all j and q; that is, the best immediate stopping payoff can fall after an observation.

2. Relation (8) illustrates that, in general, $V(E_{jq}(\Pi)) \not\ge V(\Pi)$ for all j and q. In particular,
note that for c large, testing is never optimal for any $\Pi$, so that $V(\Pi) = \alpha(\Pi)$.

3. When there is no information regarding the contents of the file (i.e., $\Pi_1$ is the current
distribution), it is optimal to examine the median box (Box 2) first. A more detailed
discussion of strategies of this form follows.

4. If there is ever uncertainty regarding the status of $r_3 = 9$, it is optimal to search.


Since at most three boxes need to be examined, the evaluation of the explicit optimal
payoff as a function of the search cost is straightforward. The fact that $p_i = 1/2$ also
simplifies the computation of the revised probabilities resulting from the acquisition of
information. Let $B_i(c)$ be the expected payoff when Box i is searched first and an optimal
policy is followed thereafter, given that c is the search cost. Expressions for $B_i(c)$ follow
when the initial state of the decision process is $\Pi_1$:


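Look in Box 2 first:

$$
\begin{aligned}
B_2(c) = -c
&+ \tfrac{2}{8}\max\Big\{\tfrac12(1)\ \text{[Declare "In"]},\ \tfrac12(1)\ \text{[Declare "Gap 2"]},\ -c+1\ \text{[Examine Box 3]}\Big\} && \text{See "3" in Box 2}\\
&+ \tfrac{2}{8}(1) && \text{See "9" in Box 2}\\
&+ \tfrac{4}{8}\max\Big\{\tfrac14(1)\ \text{["In"]},\ \tfrac14(1)\ \text{["Gap 0"]},\ \tfrac12(1)\ \text{["Gap 1"]},\ -c+1\ \text{[Examine Box 1]}\Big\} && \text{See "0" in Box 2}
\end{aligned}
$$

Look in Box 1 first:

$$
\begin{aligned}
B_1(c) = -c
&+ \tfrac{1}{8}(1) && \text{See "0" in Box 1}\\
&+ \tfrac{4}{8}\max\Big\{\tfrac12(1)\ \text{["In"]},\ \tfrac14(1)\ \text{["Gap 1"]},\ \tfrac14(1)\ \text{["Gap 2"]},\\
&\qquad -c + \tfrac14(1) + \tfrac12\max\big\{\tfrac12(1)\ \text{["In"]},\ \tfrac12(1)\ \text{["Gap 2"]},\ -c+1\ \text{[Look Box 3]}\big\} + \tfrac14(1)\ \text{[Look Box 2]},\\
&\qquad -c + \tfrac14(1) + \tfrac34\max\big\{\tfrac13(1)\ \text{["In"]},\ \tfrac13(1)\ \text{["Gap 1"]},\ \tfrac13(1)\ \text{["Gap 2"]},\ -c+1\ \text{[Look Box 2]}\big\}\ \text{[Look Box 3]}\Big\} && \text{See "1" in Box 1}\\
&+ \tfrac{2}{8}\max\Big\{\tfrac12(1)\ \text{["In"]},\ \tfrac12(1)\ \text{["Gap 1"]},\ -c+1\ \text{[Look Box 2]}\Big\} && \text{See "3" in Box 1}\\
&+ \tfrac{1}{8}(1) && \text{See "9" in Box 1}
\end{aligned}
$$

Look in Box 3 first:

$$
\begin{aligned}
B_3(c) = -c
&+ \tfrac{1}{8}(1) && \text{See "9" in Box 3}\\
&+ \tfrac{7}{8}\max\Big\{\tfrac37(1)\ \text{[Declare "In"]},\ \tfrac17(1)\ \text{[Declare "Gap 0"]},\ \tfrac27(1)\ \text{[Declare "Gap 1"]},\ \tfrac17(1)\ \text{[Declare "Gap 2"]},\\
&\qquad -c + \tfrac27(1) + \tfrac17(1) + \tfrac47\max\big\{\tfrac14(1)\ \text{["In"]},\ \tfrac14(1)\ \text{["Gap 0"]},\ \tfrac12(1)\ \text{["Gap 1"]},\ -c+1\ \text{[Look Box 1]}\big\}\ \text{[Look Box 2]},\\
&\qquad -c + \tfrac17(1) + \tfrac27\max\big\{\tfrac12(1)\ \text{["In"]},\ \tfrac12(1)\ \text{["Gap 1"]},\ -c+1\ \text{[Look Box 2]}\big\}\\
&\qquad\quad + \tfrac37\max\big\{\tfrac13(1)\ \text{["In"]},\ \tfrac13(1)\ \text{["Gap 1"]},\ \tfrac13(1)\ \text{["Gap 2"]},\ -c+1\ \text{[Look Box 2]}\big\} + \tfrac17(1)\ \text{[Look Box 1]}\Big\} && \text{See "0" in Box 3}
\end{aligned}
$$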

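The dynamic program is now written in terms of the $B_i(c)$ functions:

$$
V(\Pi_1, c) = \max\begin{cases}
\tfrac12(1) & \text{Declare "9" to be in the file}\\
\tfrac18(1) & \text{Declare Gap "0"}\\
\tfrac14(1) & \text{Declare Gap "1"}\\
\tfrac18(1) & \text{Declare Gap "2"}\\
B_1(c) & \text{Look in Box 1 first}\\
B_2(c) & \text{Look in Box 2 first}\\
B_3(c) & \text{Look in Box 3 first}
\end{cases} \qquad (9)
$$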

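Note that if $c < \tfrac12$, then $-c + 1 > \tfrac12 > \tfrac13 > \tfrac14$, so in each "max" function inside $B_i(c)$ whose
search option is worth $-c + 1$, searching yields a higher expected payoff than stopping. Two search
options are nested rather than of this form: "Look Box 2" after seeing "1" in Box 1, worth $1 - \tfrac32 c$
against a stopping payoff of $\tfrac12$, and "Look Box 2" after finding Box 3 empty, worth $1 - \tfrac{11}{7}c$ against $\tfrac37$;
both are at least as large as the stopping payoffs when $c \le \tfrac13$. Hence, for $0 < c \le \tfrac13$, it is always
optimal to continue to search once a search action has been taken, and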

$$B_1(c) = -2c + 1, \qquad B_2(c) = -\tfrac{7}{4}c + 1, \qquad B_3(c) = -\tfrac{19}{8}c + 1.$$


We see, therefore, that $B_2(c) > B_1(c) > B_3(c)$ for all $0 < c \le \tfrac13$. If it is optimal to look in a
box, it is optimal to examine the second box first and to continue to search until the status
of $r_3 = 9$ is known with certainty. Using (9), it is optimal to initiate search if, and only if,

$$B_2(c) > \tfrac12 \iff -\tfrac74 c + 1 > \tfrac12 \iff c < \tfrac27.$$

In the example in Table 4, $c = 0.28 < \tfrac27 \approx 0.286$, so the properties discussed in points (3) and (4)
are special cases of these observations.

The optimal search and disposition strategy for Problem 3 with $p_i = \tfrac12$ for all i is
summarized as follows: If $c < \tfrac27$, search, beginning with Box 2, until the status of $r_3 = 9$
is known for certain; otherwise, stop and declare that $r_3 = 9$ is in the file.
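The closed forms and the threshold 2/7 are easy to tabulate. A minimal sketch, restricted to $0 < c \le 1/3$, where the linear expressions for $B_1$, $B_2$ and $B_3$ hold for this example; `B`, `first_action` and `HALF` are hypothetical names, and the stopping payoff 1/2 is $\alpha(\Pi_1)$ from (9).

```python
from fractions import Fraction

HALF = Fraction(1, 2)   # best stopping payoff alpha(Pi_1): declare "9" in the file


def B(c):
    """B_j(c) for the Section 5.2 example; these linear forms hold for 0 < c <= 1/3."""
    return {1: 1 - 2 * c, 2: 1 - Fraction(7, 4) * c, 3: 1 - Fraction(19, 8) * c}


def first_action(c):
    """Optimal first move from (9): search only if the best B_j(c) beats stopping."""
    b = B(c)
    best_box = max(b, key=b.get)                # Box 2 for every c > 0
    return f"examine Box {best_box}" if b[best_box] > HALF else 'declare "9" in the file'


for c in (Fraction(28, 100), Fraction(2, 7), Fraction(3, 10)):
    print(c, first_action(c))
```

At c = 2/7 the search and stop payoffs tie at 1/2, and the sketch breaks the tie in favor of stopping.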

When $c < \tfrac27$, search will always occur if there is some uncertainty regarding the status of
the record. It is easy to show via a numerical example that such a "search-while-uncertainty-persists"
strategy is typically not optimal for arbitrary values of $p_i$. When n = 3, $p_1 = 0.1$,
$p_2 = 0.3$, $p_3 = 0.8$, and c = 0.31, and the problem is to determine the status of $r_3$, the
optimal strategy is to examine Box 1 initially. If $r_1$ is observed in Box 1, it is optimal to
stop and declare that $r_3$ is in Gap 1 even though uncertainty regarding the status of $r_3$
persists.

The optimal search pattern in the "search-while-uncertainty-persists" example is that of
a "balanced" search tree discussed in Knuth (1973): if any search is optimal, it is optimal
to examine the median box first and eliminate half of the file. If the object is not found,
continue examining the median box of those boxes that have not yet been eliminated.

It is clear that searching the median box depends crucially on the assumption that
$p_i = 0.5$. Consider the following variation of the example. Again, the objective is to
determine the status of $r_3 = 9$. Assume now that $p_1 = p_2 = p_3 = p$, for some $0 < p < 1$,
so that each of the three records is equally likely to be in the file. Suppose that
c = 0.01, a relatively low value, so that it is economical to examine at least one of the
boxes. (Corollary 2 tells us that there could be search costs that preclude search.) Which
box should be searched first? From the discussion above, we conjecture that a binary
search strategy that examines the middle of the file (look in Box 2 in this case) is optimal
if p = 0.5. This strategy is not optimal for all values of p, however. When p is near one,
for example, it is clear that if a box is to be examined, it should be Box 3. (Numerical
calculations show that for any p > 0.8, it is optimal to examine Box 3 first.) On the other
hand, when p is near 0, numerical calculations show that it is optimal to examine Box 1
first. Here is where the ramification of the sequential storage requirement is most evident.
When p is near 0, it is likely that there are few records in the file. An examination of the first
box is therefore the source of a significant amount of information. Finally, when p takes
on intermediate values, numerical calculations show that it is indeed optimal to examine
Box 2, the median box, first. We see that even in the relatively simple case where the $p_i$'s
are equal, it can be difficult to predict where to begin the search. The problem becomes
even more complex when the $p_i$'s differ across records.
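Numerical calculations of this kind can be reproduced with a small exhaustive solver. The sketch below is not the author's code: it assumes independent inclusion with common probability p, sorted storage, the payoff structure of Section 5.2, and c = 0.01, and it returns the box with the largest $B_j$. `solve_first_box` is a hypothetical name. It is used here only to check the three regimes (p near 0, p = 0.5, p near 1); it does not by itself establish the 0.8 cutoff.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

RECORDS, TARGET = (1, 3, 9), 9          # determine the status of r_3 = 9


def solve_first_box(p, c):
    """Return the box j in {1, 2, 3} with the largest B_j when p_1 = p_2 = p_3 = p."""
    prior = {}
    for mask in product((False, True), repeat=3):
        pr = Fraction(1)
        for inc in mask:
            pr *= p if inc else 1 - p
        prior[tuple(r for r, inc in zip(RECORDS, mask) if inc)] = pr

    def split(support, j):
        """Partition `support` by what Box j would reveal (None = empty)."""
        parts = {}
        for s in support:
            parts.setdefault(s[j - 1] if j <= len(s) else None, set()).add(s)
        return [frozenset(g) for g in parts.values()]

    def search(support, j):
        total = sum(prior[s] for s in support)
        return -c + sum(sum(prior[s] for s in g) / total * value(g)
                        for g in split(support, j))

    @lru_cache(maxsize=None)
    def value(support):
        total = sum(prior[s] for s in support)
        mass = {}
        for s in support:
            key = "in" if TARGET in s else len(s)   # gap index, since 9 is the largest record
            mass[key] = mass.get(key, 0) + prior[s]
        best = max(mass.values()) / total
        for j in (1, 2, 3):
            if len(split(support, j)) > 1:          # skip boxes whose contents are known
                best = max(best, search(support, j))
        return best

    start = frozenset(prior)
    return max((1, 2, 3), key=lambda j: search(start, j))


for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(p, solve_first_box(p, Fraction(1, 100)))
```

Exact fractions are used so that near-ties between boxes are not decided by rounding error.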

6.  Summary 

A problem of search for a hidden object when an ordering relation holds was formulated
as an infinite-horizon Markov decision process whose state space is a continuum,
consisting of the space of probability distributions over a finite set of core states. A variation
of a procedure for computing solutions to MDP's with a finite number of states,
actions, and stages was used to compute optimal expected payoffs and an optimal search
and disposition strategy. The model formulation made it possible to derive several qualitative
characteristics of the optimal expected payoff function as well as of the optimal search
and disposition strategy.
Appendix


Valuation  Algorithm 


invalue := Σ_{i∈S} π_i;
gap := n; gapvalue := -1000;                  'Initial values
for l := 0 step 1 until k - 1 do
    x := Σ_{i∈G_l} π_i;
    if x > gapvalue then
        gapvalue := x; gap := l;
next l;
if t = k then boxvalue := -c;
else
    bestbox := 0; boxvalue := -1000;
    for j := 1 step 1 until k do
        sum := -c;
        for q := j step 1 until n + 1 do      'q indexes records in Box j
            Compute E_jq(Π) using (2);
            sum := V_{t+1}(E_jq(Π)) · σ_jq(Π) + sum;
        next q;
        if sum > boxvalue then
            boxvalue := sum; bestbox := j;
    next j;
V_t(Π) := max{invalue, gapvalue, boxvalue};
end;
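A direct Python transcription of one pass of the algorithm may help when checking entries of Tables 3 and 4. It is a sketch under assumptions the pseudocode leaves implicit: records are identified by their indices 1, ..., n, a core state is the sorted tuple of indices present in the file, index n + 1 stands for an empty box, and outcomes with $\sigma_{jq}(\Pi) = 0$ are skipped because $E_{jq}(\Pi)$ is then undefined. The next-stage value $V_{t+1}$ is passed in as a function; `valuation_step` and `stop_only` are hypothetical names.

```python
from fractions import Fraction
from itertools import product


def valuation_step(dist, k, n, c, v_next, last_stage=False):
    """One pass of the Valuation Algorithm for the distribution `dist` (Pi).

    dist       -- {core state: probability}; a core state is a sorted tuple of
                  the record indices (1..n) present in the file
    k          -- index of the requested record r_k
    v_next     -- function returning V_{t+1} for a posterior distribution
    last_stage -- mirrors the "if t = k" branch: no search is evaluated
    Returns (invalue, gapvalue, gap, boxvalue, bestbox, V_t(Pi)).
    """
    invalue = sum(pr for s, pr in dist.items() if k in s)
    gap, gapvalue = n, Fraction(-1000)                  # initial values
    for l in range(k):                                  # gaps 0, ..., k-1
        x = sum(pr for s, pr in dist.items()
                if k not in s and sum(1 for r in s if r < k) == l)
        if x > gapvalue:
            gapvalue, gap = x, l
    bestbox, boxvalue = 0, -c
    if not last_stage:
        boxvalue = Fraction(-1000)
        for j in range(1, k + 1):
            total = -c
            for q in range(j, n + 2):                   # q = n + 1: Box j is empty
                kept = {s: pr for s, pr in dist.items()
                        if (s[j - 1] if j <= len(s) else n + 1) == q}
                sigma = sum(kept.values())              # sigma_jq(Pi)
                if sigma > 0:                           # E_jq(Pi) undefined otherwise
                    posterior = {s: pr / sigma for s, pr in kept.items()}
                    total += v_next(posterior) * sigma
            if total > boxvalue:
                boxvalue, bestbox = total, j
    return invalue, gapvalue, gap, boxvalue, bestbox, max(invalue, gapvalue, boxvalue)


# Section 5.2 example: r_1 = 1, r_2 = 3, r_3 = 9 (indices 1, 2, 3), p_i = 1/2, c = 0.28.
pi1 = {tuple(i for i, inc in zip((1, 2, 3), mask) if inc): Fraction(1, 8)
       for mask in product((False, True), repeat=3)}
c = Fraction(28, 100)


def stop_only(d):
    # Final-stage value: only the stop payoffs count (boxvalue = -c never wins).
    return valuation_step(d, 3, 3, c, None, last_stage=True)[-1]


print(valuation_step(pi1, 3, 3, c, stop_only))
```

With `stop_only` as $V_{t+1}$, the call evaluates $\Pi_1$ one stage before the end: Boxes 1 and 2 tie at boxvalue 69/200 and, because the comparison is strict as in the pseudocode, bestbox stays at 1. Applying the function recursively, with `v_next` built from earlier passes, gives the multi-stage values.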